Measuring Lexical Cohesion: Beyond Word Repetition

Authors

  • Anna Kazantseva
  • Stan Szpakowicz
Abstract

This paper considers the problem of finding topical shifts in documents and, in particular, what information can be leveraged to identify them. Recent research on topical segmentation usually assumes that topical shifts in discourse are signalled by changes in vocabulary. This information, however, is not always a sufficient indicator of a topical shift, especially for certain genres. This paper explores an additional source of information. Our hypothesis is that the type of a referring expression is an indicator of how accessible its antecedent is. The shorter and less informative the expression (e.g., a personal pronoun versus a lengthy post-modified noun phrase), the more accessible the antecedent is likely to be, and the more likely it is that the topic under discussion has remained constant between the two mentions. We explore how this information can be used to augment a lexically based topical segmenter. We test our hypothesis on two types of data: literary narratives and lecture notes. The results suggest that our similarity metric is useful: depending on the settings, it either slightly improves performance or leaves it unchanged. They also suggest that certain types of referring expressions are more useful than others.
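
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the expression categories, the weights in `ACCESSIBILITY_WEIGHT`, the function names, and the mixing parameter `alpha` are all hypothetical choices made for illustration. The sketch assumes that referring expressions linking two text units have already been identified and typed, and it simply blends a bag-of-words cosine similarity with a score that is higher for shorter, less informative expressions.

```python
from collections import Counter
from math import sqrt

# Hypothetical weights (not from the paper): shorter, less informative
# expressions score higher, on the assumption that their antecedents
# are more accessible and the topic has likely stayed the same.
ACCESSIBILITY_WEIGHT = {
    "pronoun": 1.0,
    "definite_np": 0.5,
    "post_modified_np": 0.2,
}


def cosine_similarity(tokens_a, tokens_b):
    """Bag-of-words cosine similarity between two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def referential_score(expression_types):
    """Mean weight of the expressions in one unit that refer back to the other."""
    if not expression_types:
        return 0.0
    total = sum(ACCESSIBILITY_WEIGHT.get(t, 0.0) for t in expression_types)
    return total / len(expression_types)


def augmented_similarity(tokens_a, tokens_b, expression_types, alpha=0.5):
    """Blend lexical similarity with the referential score (alpha is assumed)."""
    lexical = cosine_similarity(tokens_a, tokens_b)
    return (1 - alpha) * lexical + alpha * referential_score(expression_types)
```

Under these assumed weights, two adjacent sentences with no shared vocabulary still receive a non-zero similarity when the second refers back to the first with a pronoun, and a lower one when the link is a long post-modified noun phrase. This mirrors the hypothesis that vocabulary change alone can miss topical continuity.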

Similar resources

Lexical Cohesion and Literariness in Malcolm X's "The Ballot or the Bullet"

This paper unearths the contribution of lexical cohesion to the textuality and overall meaning of Malcolm X’s speech 'The Ballot or the Bullet'. Drawing on Halliday and Hasan’s (1976) and Hoey’s (1991) theory of cohesion, specifically lexical cohesion, whose main thrust is the role of lexical items in not only contributing to meaning but also serving as cohesive ties, the paper discusses how ...

Lexical Cohesion in English and Persian Abstracts

This study compares and contrasts lexical cohesion in English and Persian abstracts of Iranian medical students’ theses to appreciate textualization processes in the two languages. For this purpose, one hundred English and Persian abstracts were selected randomly and analyzed based on Seddigh and Yarmohamadi’s (1996) lexical cohesion framework, a version of Halliday and Hasan’s (1976) and Halli...

Text Segmentation Using Reiteration and Collocation

A method is presented for segmenting text into subtopic areas. The proportion of related pairwise words is calculated between adjacent windows of text to determine their lexical similarity. The lexical cohesion relations of reiteration and collocation are used to identify related words. These relations are automatically located using a combination of three linguistic features: word repetition, ...

Computing Lexical Cohesion as a Tool for Text Analysis

Recognizing coherent structure of a text is an essential task in natural language understanding. It is necessary, for example, to resolve anaphora, ellipsis, and ambiguity. One of the dominant factors of coherence of the text structure is lexical cohesion, namely the dependency relationship between words based on associative relations in common knowledge. This thesis proposes an objective and c...

Improving Text Segmentation with Non-systematic Semantic Relation

Text segmentation is a fundamental problem in natural language processing, with applications in information retrieval, question answering, and text summarization. Almost all previous work on unsupervised text segmentation is based on the assumption of lexical cohesion, which is indicated by relations between words in the two units of text. However, they only take into account the reiteration,...


Publication date: 2014